Abstract


The financial services industry perpetually processes an over- 
whelming amount of complex data. Digital reports are often 
created based on tedious manual analysis as well as visualiza- 
tion of the underlying trends and characteristics of data. Of- 
ten, the accruing costs of human computation errors in creat- 
ing these reports are very high. We present DocuBot, a novel 
AI-powered virtual assistant for creating and modifying content
in digital documents by modeling natural language interactions
as “skills” and using them to transform underlying
data. DocuBot has the ability to agglomerate saved skills
for reuse, enabling humans to automatically generate recur- 
rent reports. DocuBot also has the capability to continuously 
learn domain-specific and user-specific vocabulary by inter- 
acting with the user. We present evidence that DocuBot adds 
value to the financial industry and demonstrate its impact with 
experiments involving real and simulated users tasked with 
creating PowerPoint presentations. 


Background 


As part of their daily business operations, employees in 
the financial services industry often process large numerical 
data-sets and generate a variety of recurrent reports, includ- 
ing PowerPoint decks. The reports allow the employees to 
transform complex financial data such as cash-flows, client 
transactions, stock prices, market risk conditions, etc. into 
accessible visualizations and presentations. The data is of- 
ten in time-series format, and many reports have a relatively 
consistent structure, with a need for recurrent updates on an 
annual, quarterly, monthly, weekly, or daily basis. The data 
can be overwhelming in the financial services industry, with 
companies like JPMorgan Chase processing more than $1 
Trillion in year-to-date volume for merchant clients in 2016 
(JPMorgan 2016), and generating upwards of 8 million PowerPoint
slides every year. As a result, the report-generation
process is often tedious, time-consuming, error-prone, and 
subject to complicated and costly controls. 

Recent advances in data-to-text generation have proven 
promising, but human-level performance remains a chal- 
lenge (Shen et al. 2020; Gatt and Krahmer 2017; Chen et al. 
2020). Furthermore, studies in this domain often focus on
open-ended relational data rather than numerical data, and 
the output is often text-only commentary paired with no vi- 
sualizations. For the use cases specific to the financial do- 
main, commercial solutions such as Narrative Science and 
Yseop have been developed. These solutions allow users 
to set up sophisticated configurations that can be reused 
to generate new reports based on pre-existing templates. 
While configurability is helpful, a more flexible solution,
one which could learn from direct interactions with end
users, would be preferable, easier to scale, and less
challenging to generalize to new use cases. 

In this paper we introduce DocuBot, a novel framework 
to automate the generation of digital reports—specifically 
PowerPoint slides—in a real-world setting through human- 
AI interaction. DocuBot provides the ability to create and 
modify content in PowerPoint presentations through natural 
language instructions, with the capability to adapt and im- 
prove by learning from experience through interactions with 
the user. DocuBot also has the capability to automatically 
generate “insights” which are natural language explanations 
of data and content displayed on the slides in these pre- 
sentations. Users can utilize this framework and create dig- 
ital presentations effortlessly, by issuing natural-language 
instructions to DocuBot, similar to a conversational virtual
assistant. However, unlike many other virtual assistants,
DocuBot’s tasks are more structured and targeted. This mo- 
tivates the use of automation methods that can leverage busi- 
ness logic and structural constraints, while also maintaining 
flexibility and expressivity. In this regard, DocuBot’s novel 
innovations include: 


• A simple CRF-based parser that identifies major concepts
in user commands and makes sense of them within the
boundaries of what a typical business user would need.


• An adaptive Knowledge Base that learns domain-specific
or user-specific lingo via dynamic learning-and-forgetting 
mechanisms. 


• A simulated user study that models synthetic user agents
with varying degrees of predictability, cooperativeness, 
and diversity in their vocabulary. The study allows us to 
benchmark the performance of two Knowledge Represen- 
tation approaches against each other and measure the im- 
pact of various user behaviors on DocuBot’s performance. 


• An end-to-end pipeline designed around the common business
need for automated generation of reports and presentations
based on financial data.


The following section describes DocuBot’s architecture 
and each component in detail. The remaining sections describe
how each component is evaluated or stress tested. We
also present illustrations of user interactions with DocuBot 
and its output, and qualitative user feedback on the time 
saved by using DocuBot to generate PowerPoint slide decks. 


Components 
Figure 1: DocuBot’s architecture and components.


In this section, we present how DocuBot is structured and
describe each component in detail. 


User Interface 


Any user-facing AI assistant needs a way to communicate 
with users, receive commands, ask clarifying questions, and 
produce proper answers. The user interacts with DocuBot 
through natural language instructions to create as well as 
modify digital presentations. The interface enables DocuBot 
to clarify any existing ambiguity present in the instruc- 
tions as well as learn domain-specific vocabulary over time 
through interactions with users. 

In a later section, we demonstrate through screenshots of 
a working demo how a user interacts with DocuBot to create 
as well as modify content for PowerPoint slides. 


Parser 


To understand and execute requests, DocuBot requires an 
intent-aware parser that is able to process and transform user 
commands into relevant actions. Since DocuBot targets a 
specific task, the intents can be restricted to a closed set like 
updating client A’s slides using client B’s data, or generating
a pie chart using 2019 data in “data.csv”. Working backwards
from the intents allows us to identify the main pieces
of information that DocuBot will need to perform each intended
task. These pieces, known as “concepts”, include*:


*The deployed version of DocuBot covers multiple additional 
concepts such as client name, period, and type of analysis. These 
concepts have not been described in the section because they were 
not included in the experiments at the time of writing this paper. 


Table 1: High-level features used by the parser to identify 
concepts in each command. 


Features                                       Example for “piechart”
Part-of-Speech                                 Noun
POS tag of previous and next word              Preposition, Verb
First and last letters                         “p”, “t”
Word with truncated first and last letters     “iechart”, “piechar”
2- and 3-letter subwords                       “pie”, “cha”, “rt”, “pi”, “ec”, “ha”, “rt”


• action: such as creating or updating a deck.

• data: the dataset to be used as the source.

• object: the type of graphic to be generated such as a
piechart.

• presentation: the name of the resulting presentation.


With the above schema, the task of parsing user commands
is reduced to identifying the above concepts in each
command, to understand human instructions. In other words,
the parser needs to transform “Please create a Piechart using
Energy data and add it in the weekly report.” into “Please
[create]_action a [Piechart]_object using [Energy]_data data
and add it in the [weekly report]_presentation.”

This can be treated as a tagging problem with relatively 
low lexical diversity (because there are not many ways in 
which a piechart can be mentioned). Our solution uses a 
CRF tagger trained on 50 natural language commands. The 
commands were curated from users across J.P. Morgan and 
annotated manually for concepts. We used the NLTK library 
(Bird, Klein, and Loper 2009) to assign Part-of-Speech tags 
to the commands. Each token was represented as a feature 
vector by concatenating the features in Table 1. Next, we 
trained a Conditional Random Field (Lafferty, McCallum, 
and Pereira 2001; Sha and Pereira 2003; Sutton and Mc- 
Callum 2012) model as implemented in CRFsuite (Okazaki 
2011), and called through the python-crfsuite package (Ko- 
robov et al. 2018). Despite the small size of the training data, 
the resulting model (evaluated on a test set of 25 unseen 
commands), performed relatively well (Macro-averaged F1- 
Score of 0.849, Precision of 0.86, Recall of 0.84). Hence we
did not deem it necessary to curate more data. The Parser’s
performance continued to prove robust during user testing. 
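
To make the feature set in Table 1 concrete, the following is a minimal sketch of per-token feature extraction. The function names, the feature keys, the boundary placeholders, and the non-overlapping chunking used for subwords are assumptions made for illustration, not the deployed implementation. POS tags are taken as input here (in the paper they come from NLTK), and the resulting dictionaries have the per-token shape that CRF toolkits typically accept.

```python
def chunks(word, size):
    """Split a word into consecutive, non-overlapping pieces of `size` letters."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def token_features(tokens, pos_tags, i):
    """Build a feature dictionary for the i-th token of a command.

    tokens   -- list of word strings
    pos_tags -- list of POS tags aligned with `tokens` (assumed precomputed)
    """
    word = tokens[i].lower()
    features = {
        "pos": pos_tags[i],
        # Sentence boundaries get placeholder tags (an assumption).
        "prev_pos": pos_tags[i - 1] if i > 0 else "BOS",
        "next_pos": pos_tags[i + 1] if i < len(tokens) - 1 else "EOS",
        "first_letter": word[0],
        "last_letter": word[-1],
        "without_first": word[1:],
        "without_last": word[:-1],
    }
    # Subword features are binary indicators keyed by the subword itself.
    for size in (2, 3):
        for piece in chunks(word, size):
            features[f"sub{size}={piece}"] = True
    return features


tokens = ["create", "a", "piechart", "using", "energy", "data"]
pos_tags = ["VB", "DT", "NN", "VBG", "NN", "NNS"]  # hypothetical tags
feats = token_features(tokens, pos_tags, 2)
print(feats["without_first"], feats["without_last"])
```

A sequence of such dictionaries, one per token, together with the manually annotated concept labels, would form one training example for the tagger.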


Knowledge Base 


The vocabulary commonly used by users can be inconsistent.
For example, “J&J” is commonly used when communicating
with DocuBot to create PowerPoint slides about the
company “Johnson and Johnson”. Furthermore, the vocab- 
ulary can be context-specific. For example depending on 
each user’s intent, “graph” could mean either “Piechart” 
or “Histogram”. It is difficult to have a consistent and ex- 
haustive vocabulary mapping across all users in a large firm. 
DocuBot overcomes this limitation by dynamically adapting
and improving its performance through interactions with
users for feedback, learning from experience.

Similar to NELL (Mitchell et al. 2015), DocuBot can 
learn from experience by interacting with the user. It em- 
ploys a continuously learning “Knowledge Base” (KB), 
which enables a dynamic mapping between natural language 
input and skills. Given the concepts tagged by the Parser, the 
KB models them into a hierarchy that can evolve over time 
for each user. The hierarchy includes main-concepts (MC),
sub-concepts (SC), and vocabulary. For instance, the main-concept
chart can refer to a set of sub-concepts piechart,
barchart, etc. A sub-concept such as piechart can itself be
expressed using a diverse vocabulary such as pie, piegraph
or pizzachart. The sub-concept barchart can be expressed 
as histogram or barplot. The KB maintains and updates 
mappings between main-concepts and sub-concepts as well 
as sub-concepts and vocabulary. 

We present two possible configurations for the KB, as described
in the sub-sections below. To maintain consistency,
the following notation holds throughout the paper: 


Definition 1 


• $\mu \in M$ refers to a main-concept (e.g. chart).
• $\sigma \in S$ refers to a sub-concept (e.g. piechart, barchart).
• $w_c \in W$ refers to a word in the user’s vocabulary (e.g.
piegraph, histogram).


The output of the Parser is a set of tuples in the form of
(main-concept, word) or $(\mu, w_c) \in O$.


Naive Knowledge Base The Naive Knowledge Base
(NKB) aims to learn the vocabulary employed by users referring
to a concept by permanently mapping the first new
word $w_c \in W$ learned from user input. After obtaining the
Parser output, the NKB enables the mapping from $O$ to $S$.
To do so, the NKB implements the following functionalities:
(i) isInKB: if $w_c$ exists anywhere in the KB accept, otherwise
reject, (ii) inferSC: return the $\sigma$ linked to the $w_c$, and
(iii) addToKB: add the mapping from $w_c$ to $\sigma$.

The learning mechanism is simple. If a $w_c$ is rejected,
DocuBot asks for clarification and uses the clarifying
response to link it to the proper $\sigma$. For instance, if the
user asks for a “pizzachart” and DocuBot cannot find it in
the NKB, it asks for clarification. If the user then asks for a
“piegraph”, the NKB may observe that the word “piegraph”
has previously been linked to the “piechart” sub-concept, so
it maps “pizzachart” to the same sub-concept. This mechanism
can learn but has no capacity for forgetting, so
incorrect mappings, once learned, cannot be corrected.
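
The three NKB functionalities and the clarification loop can be sketched as follows. This is an illustrative sketch only: the class, the `resolve` helper, and the seed vocabulary are hypothetical, and a single clarifying word stands in for the interactive dialogue.

```python
class NaiveKnowledgeBase:
    """Permanent word -> sub-concept mapping, with no way to forget."""

    def __init__(self, seed_vocabulary=None):
        # word -> sub-concept; the seed vocabulary is hypothetical.
        self.word_to_sc = dict(seed_vocabulary or {})

    def is_in_kb(self, word):
        return word in self.word_to_sc

    def infer_sc(self, word):
        return self.word_to_sc[word]

    def add_to_kb(self, word, sub_concept):
        # The first mapping learned for a word is kept permanently.
        self.word_to_sc.setdefault(word, sub_concept)


def resolve(kb, word, clarification):
    """Return the sub-concept for `word`, learning from a clarifying word if needed."""
    if kb.is_in_kb(word):
        return kb.infer_sc(word)
    if not kb.is_in_kb(clarification):
        return None  # still unknown: another clarification round would be needed
    sub_concept = kb.infer_sc(clarification)
    kb.add_to_kb(word, sub_concept)
    return sub_concept


kb = NaiveKnowledgeBase({"piegraph": "piechart", "histogram": "barchart"})
learned = resolve(kb, "pizzachart", clarification="piegraph")
print(learned)
```

Because `add_to_kb` never overwrites an existing entry, a wrong first mapping stays in place, which is the weakness the Robust Knowledge Base addresses.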


Robust Knowledge Base The NKB is limited in its flex- 
ibility and capacity to learn mappings of new vocabulary. 
The Robust Knowledge Base (RKB) is designed to over- 
come the limitations of the NKB. We introduce the notion 
of a Belief Score defined below: 


$$B(\mu, \sigma, w_c) = \Pr[w_c \in \sigma] \qquad (1)$$


where $B(\mu, \sigma, w_c)$ is the Belief Score the Robust Knowledge
Base has for a word ($w_c$) in a sub-concept ($\sigma$) belonging
to a main-concept ($\mu$).


The RKB enables the mapping from $O$ to $S$ and has the ability
to forget incorrect mappings given by malicious users.
Instead of permanently mapping the first input, the RKB maintains
a Belief Score for each triplet $\mu$-$\sigma$-$w_c$. Every time a
user interacts with DocuBot to choose a sub-concept, the
RKB updates the discrete probability distribution and re-normalizes
the score. The RKB implements the following
functionalities: (i) isInKB: if the triplet $\mu$-$\sigma$-$w_c$ is written
in the KB accept, otherwise reject, (ii) addToKB: add $\mu$-$\sigma$-$w_c$
to the RKB with an initial Belief Score, (iii) inferSC:
return the sub-concept that maximizes the belief score, or
$\arg\max_{\sigma \in S} B(\mu, \sigma, w_c)$, (iv) increaseBelief: increase the
Belief Score of the triplet $\mu$-$\sigma$-$w_c$, and (v) decreaseBelief:
decrease the Belief Score of the triplet $\mu$-$\sigma$-$w_c$.

If $l$ is the number of slides created so far, every time a
new mapping is added to the RKB, we have the following
formulas for updating the Belief Score:

$$B(\mu, \sigma, w_c) \xleftarrow{\text{increaseBelief}} B(\mu, \sigma, w_c) \cdot \frac{l-1}{l} + \frac{1}{l} \qquad (2)$$

$$B(\mu, \sigma, w_c) \xleftarrow{\text{decreaseBelief}} B(\mu, \sigma, w_c) \cdot \frac{l-1}{l} \qquad (3)$$

Eqs. 2 and 3 ensure that a normalized score is maintained
for each mapping. In later sections, we present experimental 
results on the NKB and RKB and their update mechanisms. 
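
As a rough sketch of how the belief updates behave, the code below scales every belief for a word by (l - 1)/l and adds 1/l to the chosen sub-concept, following Eq. 2 and Eq. 3. Several details are assumptions of this sketch rather than statements about DocuBot: the counter l is kept per (main-concept, word) pair, new triplets start with a Belief Score of zero, and the `observe` method that combines the increase and decrease steps is hypothetical.

```python
from collections import defaultdict


class RobustKnowledgeBase:
    """Belief scores over sub-concepts for each (main-concept, word) pair."""

    def __init__(self):
        # beliefs[(main_concept, word)][sub_concept] -> Belief Score
        self.beliefs = defaultdict(dict)
        # Observation counter per (main_concept, word); scoping l this way
        # is an assumption of this sketch.
        self.counts = defaultdict(int)

    def is_in_kb(self, mc, sc, word):
        return sc in self.beliefs[(mc, word)]

    def add_to_kb(self, mc, sc, word, initial=0.0):
        # The initial Belief Score of zero is an assumption.
        self.beliefs[(mc, word)].setdefault(sc, initial)

    def infer_sc(self, mc, word):
        scores = self.beliefs[(mc, word)]
        return max(scores, key=scores.get) if scores else None

    def observe(self, mc, chosen_sc, word):
        """Apply Eq. 2 to the chosen sub-concept and Eq. 3 to the others."""
        key = (mc, word)
        self.add_to_kb(mc, chosen_sc, word)
        self.counts[key] += 1
        l = self.counts[key]
        for sc, belief in self.beliefs[key].items():
            updated = belief * (l - 1) / l      # decreaseBelief (Eq. 3)
            if sc == chosen_sc:
                updated += 1 / l                # increaseBelief (Eq. 2)
            self.beliefs[key][sc] = updated


rkb = RobustKnowledgeBase()
rkb.observe("chart", "barchart", "graph")       # one misleading label
for _ in range(3):
    rkb.observe("chart", "piechart", "graph")   # three consistent labels
print(rkb.infer_sc("chart", "graph"))
```

Under these assumptions each belief is the running frequency with which a sub-concept was chosen for the word, so a single misleading label is outweighed once consistent labels accumulate.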


Mapping Engine 

While the Knowledge Base keeps track of terminology, the
Mapping Engine is an algorithm that maps the concepts present
in each natural language instruction into structured actions.
It enables DocuBot to jointly map the concepts identified in
a command to one of the possible action scenarios, a.k.a.
skills (described below). The Mapping Engine interacts with
the Knowledge Base, the Parser, and the User Interface to
clarify any ambiguity in intent, so DocuBot can identify the
corresponding available skills to use for appropriate
content creation or modification of digital documents.


Skills DocuBot has the flexibility to use various types of
underlying data to automatically generate documents such
as digital presentations like PowerPoints, PDFs, etc., and
output files such as JSON requests, which are of great significance
to business management and technology teams.
DocuBot utilizes the predicted output concepts for mapping
human instructions to corresponding skills, i.e. the various
available content creation and modification capabilities.
While DocuBot’s framework is widely applicable to any
general set of skills, in this paper for demonstration purposes
we leverage the set of skills available in the python-pptx library
(Canny 2019) for generating PowerPoint decks*. However,
internally at J.P. Morgan we have used the framework
for non-standard visualizations such as waterfall charts,
box-whisker plots, etc., as well as diverse presentation formats.
DocuBot’s skills may vary for different business use-cases,
and similar to human learning, DocuBot’s skills can
be continuously enhanced and new skills could be added or
learned through experience. In this paper we discuss three
types of DocuBot’s skills: (i) Atomic, (ii) Macro and (iii)
Insights Generator Skills. The first two skills are described
in the remainder of this section, while the last skill is described
as a separate component in the following section.


*These include line-charts, bar-charts, pie-charts, and other
standard charting capabilities.


Atomic Skills refer to tasks that create or modify the contents
of one slide or ‘object’ in a digital presentation from
a single natural language input command from the user.
The parameters of date and title in the slides as well as
the location of data values in the data source files are auto-generated
from templates used in reports, which are common
in business teams. Examples of natural language commands
performing Atomic Skills: (1) “Please [create]_action
a [Piechart]_object about Share Performances using [market
daily OHLC data set]_data and include it in [‘share performance
report’]_presentation Presentation.”


Macro Skills can create or modify the contents of multiple
slides, ‘objects’ or the entire digital presentation from
a single natural language input command from the user,
e.g. “Please [create]_action a [CompanyBriefingDeck]_object using
[Finance]_data data and add it in [weeklyreport]_presentation
deck.” Using this command, DocuBot creates a predetermined
template named “CompanyBriefingDeck”, of 10
slides using data in the “Finance” file, and adds the slides
to a PowerPoint presentation named “weeklyreport”. The
PowerPoint slides in fig. 5b, 5c, 6b, and 6c are examples
of DocuBot automatically generating and modifying content
using Macro Skills. In the figures we show two slides out of
the 10-slide deck ([weeklyreport]_presentation).


The main impact is the re-usability enabled by this skillification.
Macro Skills become a simple aggregation of
Atomic Skills and can be repeated over many recurrent updates.
Another novel contribution of DocuBot is its capability
to save the previous commands of a user, encapsulating
them as a new ‘object’. This allows it to acquire new ‘skills’ 
over time and enables end-users to re-use the new ‘skills’ for 
future presentations. 
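
The relationship between Atomic and Macro Skills can be illustrated with a small sketch. Everything here is hypothetical: the registry class, the skill functions, and the representation of a deck as a list of strings are stand-ins chosen so that the example is self-contained, and the two-slide macro stands in for the 10-slide template described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "deck" is modeled as a plain list of slide descriptions; a real
# implementation would call a document library instead (assumption).
Deck = List[str]
AtomicSkill = Callable[[Deck, str], None]


def add_piechart(deck: Deck, data: str) -> None:
    deck.append(f"piechart from {data}")


def add_barchart(deck: Deck, data: str) -> None:
    deck.append(f"barchart from {data}")


@dataclass
class SkillRegistry:
    atomic: Dict[str, AtomicSkill] = field(default_factory=dict)
    macros: Dict[str, List[str]] = field(default_factory=dict)

    def save_macro(self, name: str, atomic_names: List[str]) -> None:
        """Encapsulate a sequence of previously issued atomic skills as a new skill."""
        self.macros[name] = list(atomic_names)

    def run(self, name: str, deck: Deck, data: str) -> None:
        # A macro is an ordered aggregation of atomic skills; an atomic
        # skill name falls through as a one-element sequence.
        for atomic_name in self.macros.get(name, [name]):
            self.atomic[atomic_name](deck, data)


registry = SkillRegistry(atomic={"piechart": add_piechart, "barchart": add_barchart})
registry.save_macro("CompanyBriefingDeck", ["piechart", "barchart"])

weekly_report: Deck = []
registry.run("CompanyBriefingDeck", weekly_report, "Finance")
print(weekly_report)
```

Saving a macro only records the names of the atomic skills, so the same sequence can be replayed on new data at each recurrent update.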


Insights Generator 


The previously described components help DocuBot under- 
stand and normalize human input. Once human input is 
properly mapped to concepts and skills, DocuBot needs to 
generate the visualizations and commentary necessary to be 
included in the output presentation. As previously described, 
the visualizations (e.g. piecharts, histograms, etc.) are cre- 
ated based on directions from the user. But presentations that 
only include visualizations with no context and no descrip- 
tion are not useful. Each visualization needs to be accom- 
panied by commentary that summarizes it and describes the 
most important and relevant aspects of its underlying data. 
We refer to this natural-language commentary as insight. 
The Insights Generator component performs this task, using 
the following components: (i) a set of primitives that generate
insights from the raw data, (ii) the mapping from insights
to human-friendly text, (iii) a novel technique for the ranking
and selection of most important insights, and (iv) a novel
capability for hierarchical analysis of insights.


(i) Insight Generator Primitives We define primitives as 
the set of the different numerical operations that can be ap- 
plied to the underlying numerical data which is modeled as 
a time-series. Examples of primitives include: (1) Absolute 
value primitives: These primitives compute metrics on the 
raw value of the time series: minimum, maximum, rolling 
average, volatility, etc. (2) Comparison primitives: These
use access to the full history of the data to compute metrics 
about the time series and then compare the value at any slice: 
distance to the mean, comparative factor (Perera 2018), etc. 
The set of primitives is expected to grow over time. 
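
A few of the primitives named above can be sketched with the standard library alone. The function names, the choice of population standard deviation as a volatility proxy, and the sample series are assumptions for illustration; the comparative factor is not shown.

```python
from statistics import mean, pstdev


def rolling_average(series, window):
    """Mean of each consecutive window of `window` values."""
    return [mean(series[i - window + 1:i + 1]) for i in range(window - 1, len(series))]


def volatility(series):
    """Population standard deviation, used here as a simple volatility proxy."""
    return pstdev(series)


def distance_to_mean(series, index):
    """Comparison primitive: value at `index` minus the mean of the full history."""
    return series[index] - mean(series)


daily_returns = [1.0, -2.0, 3.0, 0.0, 3.0]  # made-up values
absolute = {
    "minimum": min(daily_returns),
    "maximum": max(daily_returns),
    "rolling_average": rolling_average(daily_returns, window=2),
    "volatility": volatility(daily_returns),
}
comparison = {"distance_to_mean": distance_to_mean(daily_returns, index=-1)}
print(absolute, comparison)
```

Absolute-value primitives need only the values themselves, while comparison primitives relate one slice to the full history, which is why `distance_to_mean` takes an index.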


(ii) Text Generation from Insights Data-to-text genera- 
tion is a growing area of research in NLU and NLG (Shen 
et al. 2020; Gatt and Krahmer 2017). However, performance
is still far from human baselines, which is a major hurdle 
to deployment in enterprise settings. Furthermore, user sur- 
veys revealed that users were interested in having consistent, 
well defined, and reliable text output. Therefore we opted for 
a template-based approach. Relevant templates were curated 
as part of DocuBot’s development process, which involved 
extensive engagements with end-users. The templates were 
implemented as a sequence of interchangeable slots, e.g. 
“<company share> averaged <rate>% daily return.”*
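
Slot filling for such a template can be sketched in a few lines. The `fill_template` helper and its error handling are hypothetical; the slot names come from the example template, and the values are those shown on the generated slide in fig. 5b.

```python
import re

TEMPLATE = "<company share> averaged <rate>% daily return."


def fill_template(template, values):
    """Replace each <slot> in the template with its value from `values`."""
    def substitute(match):
        slot = match.group(1)
        if slot not in values:
            raise KeyError(f"no value for slot <{slot}>")
        return str(values[slot])

    return re.sub(r"<([^<>]+)>", substitute, template)


sentence = fill_template(TEMPLATE, {"company share": "TSLA", "rate": 1.09})
print(sentence)
```

Raising on a missing slot, rather than leaving the placeholder in the text, is one way to keep the output consistent and reliable, as users requested.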

This process was also useful in collecting new primitives 
that were relevant for final users. For example, in a particu- 
lar business use-case DocuBot was required to generate in- 
sights for a new Key Performance Indicator (KPI). DocuBot 
was beneficial to the firm’s executive management by high- 
lighting relevant facts about KPIs amidst disruptive events 
such as the COVID-19 pandemic. 


(iii) Insights Ranking and Selection Not all possible 
commentary is interesting or relevant to users. Addition- 
ally, in certain cases the commentary can be overwhelm- 
ing. In one case prior to the implementation of this module, 
DocuBot was generating 12,000 daily insights. 

DocuBot aims to provide the user with only the most in- 
teresting and valuable insights. For that purpose, it needs 
to rank and/or select insights. In order to sort the insights, 
we define their importance by a set of utility scoring func- 
tions such as impact on internal revenue, anomaly compared 
to previous period, and anomaly compared to peers. Each 
insight is ranked by an interpolated aggregate of all utility 
weights, and the top $K$ insights are displayed. The interpolation
weights and $K$ are all configurable by end-users.
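
One plausible reading of the ranking step is a weighted sum of utility scores followed by a top-$K$ cut, sketched below. The function name, the utility names, the weights, and the scores are all made up for illustration, and the weighted sum is an assumption about what "interpolated aggregate" means.

```python
def rank_insights(insights, weights, k):
    """Return the top-k insights by weighted sum of their utility scores.

    insights -- list of dicts with a "text" and a "scores" mapping
    weights  -- interpolation weight per utility function (user-configurable)
    """
    def aggregate(insight):
        return sum(weights[name] * insight["scores"].get(name, 0.0) for name in weights)

    return sorted(insights, key=aggregate, reverse=True)[:k]


# Utility names follow the examples in the text; the numbers are made up.
weights = {"revenue_impact": 0.5, "anomaly_vs_previous": 0.3, "anomaly_vs_peers": 0.2}
insights = [
    {"text": "A", "scores": {"revenue_impact": 0.1, "anomaly_vs_previous": 0.9, "anomaly_vs_peers": 0.2}},
    {"text": "B", "scores": {"revenue_impact": 0.8, "anomaly_vs_previous": 0.1, "anomaly_vs_peers": 0.1}},
    {"text": "C", "scores": {"revenue_impact": 0.2, "anomaly_vs_previous": 0.2, "anomaly_vs_peers": 0.2}},
]
top = rank_insights(insights, weights, k=2)
print([insight["text"] for insight in top])
```

Changing the weights reorders the same set of insights, which is how end-users could tune what counts as interesting without touching the primitives.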


Document Generator 


As previously mentioned, for demonstrating DocuBot’s ca- 
pabilities in this paper, we use the python-pptx library 
(Canny 2019) to generate presentations. The deployed ver- 
sion of the tool supports multiple other outputs such as 
PDFs, Microsoft Word documents, and web-pages. 


*We have since experimented with PCFG-based approaches
which will be described in upcoming publications.


Experimental Results and Discussion 


In this section, we present experiments and discuss 
DocuBot’s Knowledge Base. Further, we demonstrate the 
advantages of employing the Robust Knowledge Base 
(RKB) over the Naive Knowledge Base (NKB) in the pres- 
ence of Non-Collaborative Users. We show that the RKB 
learns new vocabulary faster and has superior performance 
in mapping user intent to output formats consistently across 
all experiments. 


Naive Knowledge Base (NKB) 
Figure 2: NKB learning from user interactions. DocuBot
learns vocabulary through interaction with several users.
The number of interactions needed to create slides (and eventually
clarify the user’s intent) decreases due to learning across users.


In order to evaluate the NKB’s learning ability, we tasked 
18 users to interact with DocuBot for creating 5 slides using 
natural language commands. As discussed before, the NKB 
learns a new word $w_c$ it encounters by permanently mapping
it to a sub-concept $\sigma$ and does not have the ability to forget
the mapping. NKB assumes a perfect world scenario, where 
only collaborative users interact with DocuBot. However, as 
can be expected, in the real world wrong labels can exist. 

Figure 2 shows user interactions with DocuBot as well as 
NKB’s growing memory size. In a robust setting, the expectation
is that as the KB learns better representations and the
memory grows, the $k^{th}$ user does not have to enter as many
commands as the $(k-1)^{th}$ user. However, as the figure indicates,
the number of user interactions remains volatile, with
an average decrease of 45%, as the memory size consistently
increases. Due to the limitations of the NKB, we use the
RKB, which has the capacity to forget and re-learn proper 
representations. 


Robust Knowledge Base (RKB) 


In order to conduct a more extensive experiment with RKB, 
we simulate artificial users. This helps us to test RKB’s performance
with 1000+ users and many varying parameters. It
also allows us to conduct comparative studies between RKB 
and NKB with identical usage patterns. 




We represent the set of main-concepts $\mu$ and sub-concepts
$\sigma$ in our experiment using a dictionary. We also introduce
MatchingScore as an evaluation metric to compare the performance
of both KBs.

We then simulate identical users in exactly the same order
to compare the performance of the RKB and NKB
using the MatchingScore metric, and to demonstrate the
superior performance of the RKB. We define MatchingScore as:

$$\text{MatchingScore}(V_{\hat{\sigma}}, V_G) = \sum_{i=1}^{|V_G|} \mathbb{1}\left[\hat{\sigma}_i = G_i\right] \qquad (4)$$

where $V_{\hat{\sigma}}$ is the vector of predicted sub-concepts ($\hat{\sigma}$) by the
KB, and $V_G$ is the vector of true (gold) sub-concepts. Note
that $|V_{\hat{\sigma}}| = |V_G| = |V_{w_c}|$, where $V_{w_c}$ is the vector of words
given by the user to create slides.
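
Assuming MatchingScore counts the positions at which the predicted sub-concept equals the gold one, which is an interpretation of Eq. 4 rather than a confirmed definition, it can be computed as below. The function name and the sample vectors are hypothetical.

```python
def matching_score(predicted, gold):
    """Count the positions where the predicted sub-concept equals the gold one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold vectors must have the same length")
    return sum(1 for p, g in zip(predicted, gold) if p == g)


predicted = ["piechart", "barchart", "piechart", "linechart"]
gold = ["piechart", "barchart", "barchart", "linechart"]
score = matching_score(predicted, gold)
print(score)
```

Under this reading the score ranges from zero to the length of the vectors, so scores are comparable only between runs that use vectors of the same length.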








User Simulation To simulate a user, we generate a corpus
of potential words humans would commonly use when
interacting with DocuBot. We use Gensim (Řehůřek and
Sojka 2010) for loading the word vectors trained on the
Google News dataset (Google and mrt033 2019). The model
contains 300-dimensional vectors for 3 million words and
phrases. The phrases were obtained using a simple data-driven
approach described in (Mikolov et al. 2013). Given a
sub-concept $\sigma$, we use Word2Vec to retrieve the ordered list
$L$ of the $N$ closest neighbors $w_c$. $N$ can be varied from small
to large values to account for the diversity of vocabulary employed
by a user. For each main-concept $\mu$, the simulated
user selects a sub-concept $\sigma$ at random, then picks a
corresponding word $w_c$ with respect to a given pdf. The pdf is
chosen from a pool of distributions modeled as functions of $n$
such as $\frac{1}{n^2}$, with $n \in [0, N]$. These models encapsulate
diverse behaviours, from a wider to a more targeted vocabulary.
Learning from the NKB experimental results in fig. 2, we
identify two types of users for simulation: (a) Collaborative
users and (b) Non-Collaborative users. Users belonging to
category (a) will use a $w_c$ belonging to the list $L$ of the correct
sub-concept $\sigma$. Users belonging to category (b) will use
a $w_c$ belonging to the list $L$ of another sub-concept. We define
$\alpha$ as the ratio of Collaborative users to the total number
of users in our experiments. In the experiments reported in
this paper, we vary $\alpha$ between 0.4 and 1, assuming
that in a real-world scenario at least 40% of users are collaborative.
We also vary the parameter $N$ between 5 and 1000,
assuming that any user would know at least 5 words.
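
The simulated user can be sketched without the word vectors by substituting small hand-written neighbor lists. In this sketch the neighbor lists are hypothetical stand-ins for the Word2Vec results, ranks start at 1 to avoid division by zero, and the collaborative ratio is applied per request as a probability; all three are simplifying assumptions.

```python
import random

# Hypothetical neighbor lists L, ordered from closest to farthest; in the
# paper these come from Word2Vec nearest neighbors of each sub-concept.
NEIGHBORS = {
    "piechart": ["pie", "piegraph", "pizzachart", "donut"],
    "barchart": ["histogram", "barplot", "bars", "columns"],
}


def pick_word(words, rng, power=2):
    """Sample a word with probability proportional to 1 / rank**power."""
    weights = [1 / (rank ** power) for rank in range(1, len(words) + 1)]
    return rng.choices(words, weights=weights, k=1)[0]


def simulate_user(alpha, rng, n=4):
    """Return (gold sub-concept, word used) for one simulated request."""
    gold = rng.choice(sorted(NEIGHBORS))
    if rng.random() < alpha:
        source = gold                      # Collaborative: correct list
    else:
        others = [sc for sc in sorted(NEIGHBORS) if sc != gold]
        source = rng.choice(others)        # Non-Collaborative: another list
    return gold, pick_word(NEIGHBORS[source][:n], rng)


rng = random.Random(0)
requests = [simulate_user(alpha=0.6, rng=rng) for _ in range(1000)]
print(requests[:3])
```

A larger `power` concentrates the probability mass on the closest neighbors, modeling a more targeted vocabulary, while a larger `n` widens the vocabulary a user can draw from.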


Experiment Parameters The experiment is repeated $E$
times, resetting the NKB and RKB every time. Each experiment
is divided into two phases, a learning phase and an
evaluation phase. During the learning phase, the two KBs
are exposed to a proportion $\alpha < 1$ of Collaborative Users.
In the evaluation phase, both KBs are exposed only to Collaborative
Users. For each of the experiments (both in training
and testing), the KBs are exposed to the creation of $S$
slides. We average the results obtained across the $E$ experiments
and show them in Figures 3 and 4 (for these figures
$E=10$ as it was a sufficient number of experiments to observe
consistent results, and $S=3000$ as it was sufficient to observe
stabilization in learning). Figure 3 demonstrates that
the Robust Knowledge Base (RKB) learns new vocabulary
faster from simulated users compared to the Naive Knowledge
Base (NKB). Figure 4 demonstrates that the RKB consistently
outperforms the NKB in all three scenarios, having
a higher Matching Score, with varying $\alpha$ (which controls the
proportion of Non-Collaborative users).

Figure 3: Learning phase evolution of MatchingScore in different experimental scenarios of the creation of 3000 simulated
slides with $N = 50$ and $\alpha = 0.6$ for different pdfs (average of 10 simulations, smoothed over a rolling window of 20). These
experiments demonstrate the superior performance of the Robust Knowledge Base (RKB), since it learns the new-vocabulary
mappings faster and forgets incorrect mappings over time, resulting in a higher Matching Score compared to the Naive Knowledge
Base (NKB). The smaller the variety of the users’ vocabulary, the faster the KB learns.

Figure 4: Evaluation phase simulation results comparing NKB and RKB. Heatmap showing the average difference of
MatchingScore (score of RKB minus score of NKB) for the creation of 3000 slides. The MatchingScore is between 0
and 10. The x axis corresponds to the ratio $\alpha$ (ratio of Collaborative users to the total number of users), the y axis corresponds
to the vocabulary size of the user. These experiments show that the RKB consistently outperforms the NKB across all scenarios
in terms of successful mapping between the user input and the text output in the document (higher Matching Score), in the
presence of Non-Collaborative users.


Illustrated Interface and Output 


DocuBot has been successfully deployed to generate different
types of financial presentations and digital reports internally
within J.P. Morgan. Business use-cases with many
applications for DocuBot include Investment Banking, Finance &
Business Management, Audit, and External Reporting teams.
In this paper, for purposes of demonstration,
we illustrate an example of a user interacting with
DocuBot to generate two PowerPoint slides*.


The end-user is able to create and modify content in Pow- 
erPoint slides by interacting with DocuBot as illustrated in 
fig. 5a and fig. 6a. The user asks DocuBot to create a ‘brief- 
ing deck’ about the company Tesla Motor. As seen in fig. 5b 
and fig. 5c, DocuBot ingests the OHLCV data (YahooFinance
2020) as input and creates two slides based on predefined
templates ([“briefing deck”]_object) about the Share Price
Performance and Share Volume Traded Analysis by interacting
with the user. DocuBot’s main features that are demonstrated
in this interaction are listed below:


*In practice, DocuBot can be used to create a large number of 
slides (20+) which is of great significance when creating Pitch- 
Books (a typical PowerPoint slide deck used for client communi- 
cations) essential to Investment Banking teams. 


DocuBot: How can I help you today?
User: Create a briefing deck for Tesla Motor 


DocuBot: I'm afraid that I am not familiar with "Tesla Motor" 
Can you please enter the ticker of the company? 


User: TSLA 


DocuBot: Thank you for clarifying the name of the company: Tesla 
Do you want to use Facebook, Peloton and Zoom as comparable firms? 


User: Yes 
DocuBot: Done! Do you have any other instruction for me? 
User: Run the analysis 


DocuBot: Running the following analysis: 
Template: DocuBot_OVERVIEW 

Subject Company: Tesla 

Comparable firms: Facebook, Peloton, Zoom 
Analysis Date: 2020-09-06 

Time Series Duration: 3 Months 

Type of Operation: Mean 




Generating the powerpoint... 


The analysis is ready! 


(a) User - DocuBot interaction to create the deck of slides 


Tesla — Share Price Performance Analysis 


• 3 Months Performance of Daily Return (Weekly Mean) of
Tesla against Zoom, Facebook and Peloton.


Share Price Daily Return - Weekly Mean in %

During the last 3 months:
- TSLA share closed positive 58% of the time
- TSLA averages a 1.09% daily return
- TSLA largest daily

Automatically created by DocuBot


(b) First page of the deck of slides 


Tesla — Share Volume Traded Analysis 


• 3 Months Comparison of Daily Traded Volume (Weekly
Mean) of Tesla against Zoom, Facebook and Peloton.


Share Volume Daily Traded - Weekly Mean

Compared to 3 months ago:
- TSLA share daily traded volume increased by 73.1%
- TSLA comparative group volume increased by 67%

Automatically created by DocuBot


(c) Second page of the deck of slides 


Figure 5: User provides a set of instructions and DocuBot creates the slide deck accordingly. 








DocuBot: Done! Do you have any other instruction for me?
User: Add Apple as a comparable firm to the deck 
DocuBot: Done! Do you have any other instruction for me? 
User: Remove Peloton for the comparable firms 

DocuBot: Done! Do you have any other instruction for me? 
User: Modify the Time Series duration to 6 months 
DocuBot: Done! Do you have any other instruction for me? 
User: Compute the Median instead of the Mean 

DocuBot: Done! Do you have any other instruction for me? 
User: Run the analysis 

DocuBot: Running the following analysis: 

Template: DocuBot_OVERVIEW 

Subject Company: Tesla 

Comparable firms: Facebook, Zoom, Apple 

Analysis Date: 2020-09-06 

Time Series Duration: 6 Months 

Type of Operation: Median 

Generating the powerpoint... 


The analysis is ready! 


DocuBot: Done! Do you have any other instruction for me? 


User: No 


DocuBot: Got it. Well enjoy the rest of your day and let me know if 
you need anything else! 


(a) User - DocuBot interaction to modify the deck of slides 


Tesla — Share Price Performance Analysis 


• 6 Months Performance of Daily Return (Weekly Median) of
Tesla against Zoom, Facebook and Apple.


Share Price Daily Return - Weekly Median in %


During the last 6 months:
- TSLA share closed positive 56% of the time
- TSLA averages a 1.11% daily return
- TSLA largest daily positive return is 13.5%
- TSLA largest daily negative return is 21%


Automatically created by DocuBot


(b) First page of the deck of slides 


Tesla — Share Volume Traded Analysis 


• 6 Months Comparison of Daily Traded Volume (Weekly
Median) of Tesla against Zoom, Facebook and Apple.


Share Volume Daily Traded - Weekly Median


Compared to 6 months ago:
- TSLA share daily traded volume decreased by 19.8%


Automatically created by DocuBot


(c) Second page of the deck of slides 


Figure 6: User provides a second set of instructions and DocuBot modifies the slide deck accordingly. 


• The user asks DocuBot to create a ‘briefing deck’ about
‘Tesla Motor’. However, this is not the official name of the
company as saved in DocuBot’s knowledge base, so
DocuBot does not recognize it initially.


• DocuBot asks the user to clarify by specifying the ticker
of the company (the unique symbol under which it is
publicly traded in financial markets). The ticker serves as
a common key between the different words in the
knowledge base, so DocuBot can link ‘Tesla Motor’ to the
ticker ‘TSLA’, which in turn links to the company name
‘Tesla’.


• DocuBot also keeps a list of commonly used comparable
firms in its parameter configuration. These are the firms
that a user most frequently compares against to measure
financial performance.


• The user inputs the key sentence ‘Run the analysis’, which
launches the analysis using the default set of parameters.
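The clarification step described above can be illustrated with a minimal sketch. It assumes a ticker-keyed store in which every known phrase maps to a ticker and every ticker maps to one official name; the class `KnowledgeBase` and its methods are hypothetical names chosen for illustration and are not DocuBot's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical ticker-keyed store (illustration only)."""

    official_names: dict[str, str] = field(default_factory=dict)   # ticker -> name
    alias_to_ticker: dict[str, str] = field(default_factory=dict)  # phrase -> ticker

    def add_company(self, ticker: str, name: str) -> None:
        """Register a company and index its official name as a known phrase."""
        self.official_names[ticker] = name
        self.alias_to_ticker[name.lower()] = ticker

    def resolve(self, phrase: str) -> str | None:
        """Return the official name for a phrase, or None if it is unknown."""
        ticker = self.alias_to_ticker.get(phrase.lower())
        return self.official_names.get(ticker) if ticker else None

    def learn_alias(self, phrase: str, ticker: str) -> str:
        """Link a new phrase to a known ticker once the user has clarified it."""
        if ticker not in self.official_names:
            raise KeyError(f"unknown ticker: {ticker}")
        self.alias_to_ticker[phrase.lower()] = ticker
        return self.official_names[ticker]


kb = KnowledgeBase()
kb.add_company("TSLA", "Tesla")

first_try = kb.resolve("Tesla Motor")            # phrase not yet known
learned = kb.learn_alias("Tesla Motor", "TSLA")  # user supplies the ticker
second_try = kb.resolve("Tesla Motor")           # now resolves through TSLA
```

Because the ticker is the shared key, any number of user phrasings can point to the same company without duplicating the company record.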


We further demonstrate DocuBot’s content modification 
capabilities in fig. 6a, where the user instructs DocuBot 
to modify the content of the slides in fig. 5b and fig. 5c. 
DocuBot is tasked with modifying parameters such as the 
list of comparable firms, horizon of analysis and metric for 
performance analysis: 


• The user asks to change the comparable firms by adding
the company ‘Apple’ and removing the company ‘Peloton’.
The new list of comparable firms is automatically saved
by DocuBot as a parameter configuration for future use
by this user.


• The user asks to change the time horizon of the analysis
from 3 months to 6 months. The displayed analysis now
takes into account the last 6 months of time-series data
instead of 3 months.


• The user asks DocuBot to use the ‘Median’ instead of the
‘Mean’ as the metric for performance analysis. The
operation applied to the data is now a weekly median
instead of a weekly mean.


• The user inputs the command ‘Run the analysis’, which
regenerates the final output slides using the new set of
parameters, as shown in fig. 6b and fig. 6c.
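The parameter changes listed above can be sketched as edits to a small configuration object followed by a re-run of the aggregation. This is an illustrative sketch only: `AnalysisConfig`, `weekly_aggregate`, the five-trading-day week, and the toy return values are all assumptions and do not come from DocuBot itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean, median

# Hypothetical mapping from the user's chosen operation to a function.
OPERATIONS = {"mean": mean, "median": median}


@dataclass
class AnalysisConfig:
    """Hypothetical parameter configuration for one deck."""

    subject: str
    comparables: list[str] = field(default_factory=list)
    duration_months: int = 3
    operation: str = "mean"

    def add_comparable(self, name: str) -> None:
        if name not in self.comparables:
            self.comparables.append(name)

    def remove_comparable(self, name: str) -> None:
        if name in self.comparables:
            self.comparables.remove(name)


def weekly_aggregate(daily_returns: list[float], operation: str,
                     days_per_week: int = 5) -> list[float]:
    """Aggregate daily values into consecutive weekly buckets."""
    aggregate = OPERATIONS[operation]
    return [aggregate(daily_returns[i:i + days_per_week])
            for i in range(0, len(daily_returns), days_per_week)]


# Defaults from the first interaction, then the user's four modifications.
config = AnalysisConfig("Tesla", ["Facebook", "Peloton", "Zoom"])
config.add_comparable("Apple")
config.remove_comparable("Peloton")
config.duration_months = 6
config.operation = "median"

# Toy daily returns in percent (two trading weeks), not real market data.
toy_returns = [1.0, -2.0, 3.0, 0.5, 10.0, 2.0, 4.0, -1.0, 0.0, 3.0]
weekly = weekly_aggregate(toy_returns, config.operation)
```

Keeping the operation as a lookup key rather than hard-coded logic means that switching from mean to median changes one parameter and leaves the rest of the pipeline untouched.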


Internally across the firm, we have extensively tested and
evaluated DocuBot, successfully generating several business
reports and presentations.


Contributions and Impact 


DocuBot is a tool proven in use at J.P. Morgan, and it
contributes in several directions: (1) DocuBot leverages its
learning capability across users to learn the particular or
domain-specific vocabulary employed by groups of users.
(2) The use of a Robust Knowledge Base enables DocuBot to
correct its memory if it is affected by malicious users; this
capacity to forget also lets it adapt as the vocabulary evolves
over time. (3) DocuBot is capable of Saving and Reusing
Skills, which enables users to define their own Macro Skills
and re-use them later to save time. Creating new reports
becomes easy: the user only needs to create the new skills
and add them to the skills library. (4) DocuBot enables
what-if scenario analysis at $0 cost, since it is very easy to
modify the document parameters and obtain the output
instantaneously. (5) DocuBot automatically generates text
from numerical data. This intelligent text generation is
possible through the use of insights ranking and selection,
so that only the most interesting facts are output. Internally
at J.P. Morgan, feedback from users suggested that the
automation introduced by DocuBot to help business analysts
could potentially reduce the time they spend creating and
updating PowerPoint slides from over 5 hours to less than
1 minute.
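Contribution (5) can be illustrated with a minimal sketch of template-based text generation followed by ranking and selection. The sentence templates mirror the insight text visible on the slides in fig. 5b and fig. 6b, but the scoring rule (distance from 50% for the positive-close share, absolute magnitude for the rest) is a placeholder assumption, as are the function names and the toy data; the actual ranking used by DocuBot is not specified in this section.

```python
from __future__ import annotations

from statistics import mean


def build_insights(ticker: str, daily_returns_pct: list[float]) -> list[tuple[float, str]]:
    """Fill sentence templates and attach a placeholder interest score to each."""
    positive_share = 100 * sum(r > 0 for r in daily_returns_pct) / len(daily_returns_pct)
    average = mean(daily_returns_pct)
    best = max(daily_returns_pct)
    worst = min(daily_returns_pct)
    return [
        (abs(positive_share - 50),
         f"{ticker} share closed positive {positive_share:.0f}% of the time"),
        (abs(average),
         f"{ticker} averages a {average:.2f}% daily return"),
        (abs(best),
         f"{ticker} largest daily positive return is {best:.1f}%"),
        (abs(worst),
         f"{ticker} largest daily negative return is {abs(worst):.1f}%"),
    ]


def select_top(insights: list[tuple[float, str]], k: int) -> list[str]:
    """Keep the k highest-scoring sentences, most interesting first."""
    ranked = sorted(insights, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:k]]


# Toy daily returns in percent, not real market data.
toy_returns = [2.0, -1.0, 4.0, 1.0, -6.0]
top_insights = select_top(build_insights("TSLA", toy_returns), k=2)
```

Separating scoring from sentence construction would allow the scoring function to be swapped without touching the templates.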


Conclusion and Future Work 


In this paper, we have introduced a novel framework, 
DocuBot, to automate the generation of digital reports like 
PowerPoint slides through human-AI interaction. To provide 
an easily interpretable explanation of the data displayed on 
the slides, we have also introduced the automated generation 
of AI Insights in these presentations. We have demonstrated 
the robustness of DocuBot to adapt to different types of users 
through several experiments. While we continue to enhance
several components of the framework, we have deployed
DocuBot internally at J.P. Morgan to demonstrate its
capabilities in generating financial presentations for
real-world use cases. In addition, by applying the DocuBot
framework both internally and to public stock market data,
we have demonstrated the broad applicability of this
emerging technology across the industry.

We are currently exploring a few avenues to enhance
DocuBot’s capabilities. We are experimenting with
data-to-text generation methods based on Probabilistic
Context-Free Grammars, which are more flexible than the
template-based approach presented here. We are also
exploring ways to make it easier for DocuBot to scale to
new use cases, such as methods for accommodating new
types of analysis, custom insight-scoring functions, and a
source-agnostic data ingestion layer.